Achieving Privacy in the Adversarial Multi-Armed Bandit
Authors
Abstract
In this paper, we improve the previously best known regret bound for achieving ε-differential privacy in oblivious adversarial bandits from O(T^(2/3)/ε) to O(√(T ln T)/ε). This is achieved by combining a Laplace mechanism with EXP3. We show that though EXP3 is already differentially private, it leaks a linear amount of information in T. However, we can improve this privacy by relying on its intrinsic exponential mechanism for selecting actions. This allows us to reach O(√(ln T))-DP, with a regret of O(T^(2/3)) that holds against an adaptive adversary, an improvement over the best known of O(T^(3/4)). This is done by using an algorithm that runs EXP3 in a mini-batch loop. Finally, we run experiments that clearly demonstrate the validity of our theoretical analysis.
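The abstract's first construction (a Laplace mechanism combined with EXP3, adding calibrated noise to the importance-weighted loss estimates) can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's exact algorithm: the parameter names (`eta`, `gamma`, `eps`) and the `losses` callback are illustrative choices, and noise is sampled as a difference of two exponentials, which is distributed Laplace(0, 1/ε).

```python
import math
import random

def dp_exp3(T, K, losses, eta=0.1, gamma=0.1, eps=1.0):
    """Sketch: EXP3 with Laplace noise on loss estimates.

    T       -- number of rounds
    K       -- number of arms
    losses  -- callback losses(t, arm) -> loss in [0, 1]
    eta     -- learning rate (illustrative value)
    gamma   -- uniform-exploration mixing rate (illustrative value)
    eps     -- differential-privacy parameter; noise scale is 1/eps
    """
    weights = [1.0] * K
    total_loss = 0.0
    for t in range(T):
        s = sum(weights)
        # Exponential-weights distribution, mixed with uniform exploration
        probs = [(1 - gamma) * w / s + gamma / K for w in weights]
        arm = random.choices(range(K), weights=probs)[0]
        loss = losses(t, arm)           # bandit feedback: only the chosen arm's loss
        total_loss += loss
        est = loss / probs[arm]         # importance-weighted loss estimate
        # Laplace(0, 1/eps) noise: difference of two Exp(eps) variables
        noise = random.expovariate(eps) - random.expovariate(eps)
        weights[arm] *= math.exp(-eta * (est + noise))
        # Renormalize to keep weights numerically stable
        m = max(weights)
        weights = [w / m for w in weights]
    return total_loss
```

The mini-batch variant mentioned for the adaptive-adversary result would wrap this update so that EXP3 only observes aggregated losses once per batch; that loop is omitted here.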
Similar resources
Online Linear Optimization through the Differential Privacy Lens
We develop a simple and powerful analysis technique for perturbation-style online learning algorithms, based on privacy-preserving randomization, that exhibits a suite of novel results. In particular, this work highlights the valuable addition of differential privacy methods to the toolkit used to design and understand online linear optimization tasks. This work describes the minimax optimal algo...
Stochastic and Adversarial Combinatorial Bandits
This paper investigates stochastic and adversarial combinatorial multi-armed bandit problems. In the stochastic setting, we first derive problem-specific regret lower bounds, and analyze how these bounds scale with the dimension of the decision space. We then propose COMBUCB, algorithms that efficiently exploit the combinatorial structure of the problem, and derive finite-time upper bounds on thei...
Mistake Bounds on Noise-Free Multi-Armed Bandit Game
We study the {0, 1}-loss version of adaptive adversarial multi-armed bandit problems with α (≥ 1) lossless arms. For this problem, we show a tight bound of K − α − Θ(1/T) on the minimax expected number of mistakes (1-losses), where K is the number of arms and T is the number of rounds.
Noise Free Multi-armed Bandit Game
We study the loss version of adversarial multi-armed bandit problems with one lossless arm. We show an adversary's strategy that forces any player to suffer K − 1 − O(1/T) loss, where K is the number of arms and T is the number of rounds.
A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits (extended version)
We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose an efficient algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. We prov...
Publication date: 2017